Trustworthy Artificial Intelligence

Delft University of Technology

Arie van Deursen
Cynthia C. S. Liem

May 13, 2024

Overview

  • Question: How can we faithfully explain predictions of opaque machine learning models?
  • Methods: Counterfactual Explanations, Algorithmic Recourse, Probabilistic Machine Learning, Energy-Based Models, Conformal Prediction, …
  • Applications: Mostly finance and economics, but also images and natural language.
  • Tools: I am a Julia developer and the founder of Taija, an organization for trustworthy AI in Julia.

Counterfactual Explanations

\[ \begin{aligned} \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {\text{yloss}(M_{\theta}(f(\mathbf{Z}^\prime)),\mathbf{y}^+)} + \lambda {\text{cost}(f(\mathbf{Z}^\prime)) } \} \end{aligned} \]

Counterfactual Explanations (CE) explain how inputs into a model need to change for it to produce different outputs.

📜 Altmeyer, van Deursen, et al. (2023) @ JuliaCon 2022.

Figure 1: Gradient-based counterfactual search.
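The gradient-based search in Figure 1 can be sketched in a few lines of Julia. This is a toy stand-in (a hand-rolled logistic model with an analytic gradient), not the Taija implementation: it minimises the objective above, with squared error for yloss and squared distance for cost.

```julia
σ(z) = 1 / (1 + exp(-z))

# Fixed toy model: p(y = 1 | x) = σ(w'x + b)
w, b = [2.0, -1.0], 0.5
predict(x) = σ(w' * x + b)

# yloss pulls the prediction towards the target class; cost keeps the
# counterfactual close to the factual input x₀ (cf. the objective above).
yloss(x, y_target) = (predict(x) - y_target)^2
cost(x, x₀) = sum(abs2, x - x₀)

# Analytic gradient of the penalised objective w.r.t. x.
function ∇objective(x, x₀, y_target, λ)
    p = predict(x)
    ∇yloss = 2 * (p - y_target) * p * (1 - p) * w
    ∇cost = 2 * (x - x₀)
    return ∇yloss + λ * ∇cost
end

# Plain gradient descent in input space.
function counterfactual(x₀; y_target = 1.0, λ = 0.1, η = 0.5, steps = 500)
    x = copy(x₀)
    for _ in 1:steps
        x -= η * ∇objective(x, x₀, y_target, λ)
    end
    return x
end

x₀ = [-1.0, 1.0]            # factual: predicted class 0
x′ = counterfactual(x₀)     # nudged across the decision boundary
```

The penalty strength λ trades off validity against proximity: a large λ keeps x′ close to x₀ but may fail to flip the prediction.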

Pick your Poison

All of these counterfactuals are valid explanations for the model’s prediction.

Which one would you pick?

Figure 2: Turning a 9 into a 7: Counterfactual explanations for an image classifier produced using Wachter (Wachter, Mittelstadt, and Russell 2017), Schut (Schut et al. 2021) and REVISE (Joshi et al. 2019).

ECCCos from the Black-Box

📜 Altmeyer, Farmanbar, et al. (2023) @ AAAI 2024

Key Idea

Use the hybrid objective of joint energy models (JEM) and a model-agnostic penalty for predictive uncertainty: Energy-Constrained (\(\mathcal{E}_{\theta}\)) Conformal (\(\Omega\)) Counterfactuals (ECCCo).

ECCCo objective:

\[ \begin{aligned} & \min_{\mathbf{Z}^\prime \in \mathcal{Z}^L} \{ {L_{\text{clf}}(f(\mathbf{Z}^\prime);M_{\theta},\mathbf{y}^+)}+ \lambda_1 {\text{cost}(f(\mathbf{Z}^\prime)) } \\ &+ \lambda_2 \mathcal{E}_{\theta}(f(\mathbf{Z}^\prime)|\mathbf{y}^+) + \lambda_3 \Omega(C_{\theta}(f(\mathbf{Z}^\prime);\alpha)) \} \end{aligned} \]

Figure 3: Gradient fields and counterfactual paths for different generators.
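Schematically, the four-term objective above can be written down as a single weighted sum. The sketch below uses hypothetical toy stand-ins for the model, the energy \(\mathcal{E}_{\theta}\), and the conformal set-size penalty \(\Omega\); it is not the paper's implementation.

```julia
σ(z) = 1 / (1 + exp(-z))
w, b = [2.0, -1.0], 0.5                      # toy logistic classifier M_θ

# The four terms of the ECCCo objective, with toy stand-ins:
clf_loss(x, y) = (σ(w' * x + b) - y)^2       # L_clf(f(z′); M_θ, y⁺)
cost(x, x₀)    = sum(abs2, x - x₀)           # cost(f(z′))
energy(x)      = sum(abs2, x) / 2            # stand-in for 𝓔_θ(f(z′) | y⁺)
set_size(x)    = 4 * σ(w' * x + b) * (1 - σ(w' * x + b))  # stand-in for Ω(C_θ(⋅; α))

# Weighted sum, mirroring λ₁, λ₂, λ₃ in the formula above.
function eccco_objective(x, x₀, y; λ₁ = 0.1, λ₂ = 0.1, λ₃ = 0.1)
    clf_loss(x, y) + λ₁ * cost(x, x₀) + λ₂ * energy(x) + λ₃ * set_size(x)
end

x₀ = [-1.0, 1.0]
eccco_objective(x₀, x₀, 1.0)                 # objective at the factual
```

The energy term pulls counterfactuals towards regions the model itself deems plausible, while the set-size term penalises predictive uncertainty; together they are what makes the resulting counterfactuals faithful to the model.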

Faithful Counterfactuals

Figure 4: Turning a 9 into a 7. ECCCo applied to MLP (a), Ensemble (b), JEM (c), JEM Ensemble (d).

ECCCo generates counterfactuals that

  • faithfully represent model quality (Figure 4).
  • achieve state-of-the-art plausibility (Figure 5).

Figure 5: Results for different generators (from 3 to 5).

Dynamics of Counterfactuals

📜 Altmeyer, Angela, et al. (2023) @ SaTML 2023.

Figure 6: Illustration of external cost of individual recourse.

Spurious Sparks of AGI

📜 In Altmeyer et al. (2024) @ ECONDAT 2024, we challenge the idea that finding meaningful patterns in the latent spaces of large models is indicative of AGI.

Figure 7: Inflation of prices or birds? It doesn’t matter!

Taija

  • Work presented @ JuliaCon 2022, 2023, 2024.
  • Running project @ Google Summer of Code 2024.
  • Total of three software projects @ TU Delft.

Trustworthy AI in Julia: github.com/JuliaTrustworthyAI


References

Altmeyer, Patrick, Giovan Angela, Aleksander Buszydlik, Karol Dobiczek, Arie van Deursen, and Cynthia C. S. Liem. 2023. “Endogenous Macrodynamics in Algorithmic Recourse.” In 2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 418–31. IEEE.
Altmeyer, Patrick, Andrew M. Demetriou, Antony Bartlett, and Cynthia C. S. Liem. 2024. “Position Paper: Against Spurious Sparks – Dovelating Inflated AI Claims.” https://arxiv.org/abs/2402.03962.
Altmeyer, Patrick, Arie van Deursen, et al. 2023. “Explaining Black-Box Models Through Counterfactuals.” In Proceedings of the JuliaCon Conferences, 1 (1): 130.
Altmeyer, Patrick, Mojtaba Farmanbar, Arie van Deursen, and Cynthia C. S. Liem. 2023. “Faithful Model Explanations Through Energy-Constrained Conformal Counterfactuals.” https://arxiv.org/abs/2312.10648.
Grathwohl, Will, Kuan-Chieh Wang, Joern-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. 2020. “Your Classifier Is Secretly an Energy Based Model and You Should Treat It Like One.” In International Conference on Learning Representations.
Joshi, Shalmali, Oluwasanmi Koyejo, Warut Vijitbenjaronk, Been Kim, and Joydeep Ghosh. 2019. “Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems.” https://arxiv.org/abs/1907.09615.
Schut, Lisa, Oscar Key, Rory Mc Grath, Luca Costabello, Bogdan Sacaleanu, Yarin Gal, et al. 2021. “Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties.” In International Conference on Artificial Intelligence and Statistics, 1756–64. PMLR.
Stutz, David, Krishnamurthy Dvijotham, Ali Taylan Cemgil, and Arnaud Doucet. 2022. “Learning Optimal Conformal Classifiers.” https://arxiv.org/abs/2110.09192.
Wachter, Sandra, Brent Mittelstadt, and Chris Russell. 2017. “Counterfactual Explanations Without Opening the Black Box: Automated Decisions and the GDPR.” Harv. JL & Tech. 31: 841. https://doi.org/10.2139/ssrn.3063289.